EN FR
EN FR
ROMA - 2019
New Software and Platforms
Bilateral Contracts and Grants with Industry
Bibliography
New Software and Platforms
Bilateral Contracts and Grants with Industry
Bibliography


Section: New Results

High performance tensor–vector multiplication on shared-memory systems

Tensor–vector multiplication is one of the core components in tensor computations. We have recently investigated high performance, single core implementation of this bandwidth-bound operation. Here, we investigate its efficient, shared-memory implementations. Upon carefully analyzing the design space, we implement a number of alternatives using OpenMP and compare them experimentally. Experimental results on up to 8 socket systems show near peak performance for the proposed algorithms.

This work appears in the proceedings of PPAM2019 and is supported with a technical report [22], [36].